Reasoning with Noisy Semantic Data
Abstract
Based on URIs, HTTP and RDF, the Linked Data project [3] aims to expose, share and connect related data from diverse sources on the Semantic Web. Linked Open Data (LOD) is a community effort to apply the Linked Data principles to data published under open licenses. Through this effort, a large number of LOD datasets have been gathered in the LOD cloud, such as DBpedia, Freebase and FOAF profiles. These datasets are connected by links such as owl:sameAs. LOD has progressed rapidly and is still growing. As of May 2009, there were 4.7 billion RDF triples and around 142 million RDF links [3]. By March 2010 the total had increased to 16 billion triples, and another 14 billion triples have been published by the AIFB, according to [17]. With the ever-growing LOD datasets, one problem naturally arises: the generation of the data may introduce noise, which hinders the application of the data in practice. To make Linked Data more useful, it is important to propose approaches for dealing with noise within the data. In [6], the authors classify noise in Linked Data into three main categories: accessibility and dereferenceability w.r.t. URI/HTTP, syntax errors, and noise and inconsistency w.r.t. reasoning. In our work, we focus on the third category, namely noise and inconsistency w.r.t. reasoning, but may also consider the other categories. We further consider one more kind of noise at the logical level: the logical inconsistency caused by ontology mapping.
Similar Resources
Noisy Semantic Data Processing in Seoul Road Sign Management System
The Seoul Road Sign Management (RSM) system provides the semantic integration of LOD’s Linked Geo Data and Open Street Map with a Korean POI data set. It is an attempt to develop an intelligent road sign management system based on the LarKC platform. The RSM data set contains over 1.1 billion triples of semantic data. However, a significant amount of the RSM data is noisy (e.g., inconsis...
Semantic web technologies in pervasive computing: A survey and research roadmap
Pervasive and sensor-driven systems are by nature open and extensible, both in terms of input and tasks they are required to perform. Data streams coming from sensors are inherently noisy, imprecise and inaccurate, with differing sampling rates and complex correlations with each other. These characteristics pose a significant challenge for traditional approaches to storing, representing, exchang...
The Symbiosis of Human and Semantic Technology Through the Lens of Actor-Network Theory
Background: Semantic technologies (STs) have made machine reasoning possible by providing intelligent data management methods. This capability has created new forms of interaction between humans and STs, which is called "semantic interaction." The increasing spread of this form of interaction in daily life reveals the need to identify the factors affecting it and introduce the requirements of...
Can we ever catch up with the Web?
The Semantic Web is about to grow up. By efforts such as the Linking Open Data initiative, we finally find ourselves at the edge of a Web of Data becoming reality. Standards such as OWL 2, RIF and SPARQL 1.1 shall allow us to reason with and ask complex structured queries on this data, but still they do not play together smoothly and robustly enough to cope with huge amounts of noisy Web data. ...
Finding Genes by Case-Based Reasoning in the Presence of Noisy Case Boundaries
Effectively using previous cases requires that a reasoner first match, in some fashion, the current problem against a large library of stored cases. One largely unaddressed task in case-based reasoning is the process of parsing continuous input into discrete cases. If this parsing is not done accurately, the relevant previous cases may not be found and the advantages of case-based problem solvi...
Anatomy of a Semantic Virus
In this position paper, I discuss a piece of malicious automated software that can be used by an individual or a group of users for submitting valid random noisy RDF-based data based on predefined schemas/ontologies to Semantic search engines. The result will undermine the utility of semantic searches. I did not implement the whole virus, but checked its feasibility. The open question is whethe...